How to send your images through AI

Jovana Ugrinic, 05 November 2024

This chatbot analyzes and responds to user-submitted images. In this example it focuses on bicycles, but you can choose any topic. The bot guides users through uploading an image and generates responses based on the image and the accompanying text.

The chatbot's core functionality is analyzing images (here, bicycle images) with OpenAI's vision-capable models (GPT-4 with vision; the request in step 2 uses gpt-4o-mini) to produce meaningful insights. The following step-by-step breakdown shows how the bot processes and responds to images.

1. Set Up Basic Intent Structure

Intents or Blocks are the foundational units that define the bot's responses and actions. For this bot:
- A welcome intent/block triggers an initial greeting and prompts the user to upload an image with the message “Insert an image, please”.
- A default fallback intent/block usually handles unrecognized inputs with prompts like “Can you rephrase your question?”. In this bot, however, it prompts the user to write what they want to know about the image they've uploaded.
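In Tiledesk this flow is built visually, but modelling it in code can make the order of the blocks easier to follow. The following is a hypothetical Python sketch, not Tiledesk code: the `Conversation` class, the `handle_turn` function and the question wording are illustrative assumptions, while "Insert an image, please" and "Analyzing..." are the messages used in this article.

```python
from dataclasses import dataclass
from typing import Optional

# Messages from the article, plus one hypothetical question prompt.
WELCOME_PROMPT = "Insert an image, please"
QUESTION_PROMPT = "What would you like to know about this image?"  # hypothetical wording
ANALYZING = "Analyzing..."


@dataclass
class Conversation:
    """Hypothetical per-user state: the URL of the last uploaded image."""
    image_url: Optional[str] = None


def handle_turn(conv: Conversation, text: str = "", image_url: Optional[str] = None) -> str:
    """Return the bot's reply for one user turn (illustrative model only)."""
    if image_url is not None:
        # An upload: remember it, then act like the repurposed fallback
        # block and ask what the user wants to know about the image.
        conv.image_url = image_url
        return QUESTION_PROMPT
    if conv.image_url is None:
        # Nothing uploaded yet: behave like the welcome block.
        return WELCOME_PROMPT
    # Free text after an upload is treated as a question about the image;
    # the real bot would now call the vision model.
    return ANALYZING
```

A session might go: `handle_turn(conv, "hi")` asks for an image, a turn with `image_url=...` asks for the question, and the next text turn starts the analysis.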

2. Image Analysis Using Vision API and GPT Integration

Vision API Intent: This is the primary feature where the bot analyzes the uploaded images and responds with insights.
Steps in Image Analysis:
- Image Request: Users are asked to upload an image. While the image is processed, the bot shows a message such as "Analyzing...".
- Web Request Setup: Select the “POST” method and enter the endpoint: https://api.openai.com/v1/chat/completions

Prompt Design: Next, choose the “Body” option and insert the JSON-structured prompt, replacing the placeholders in angle brackets with the prompt text and the URL of the uploaded image:

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<prompt text>"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "<uploaded image URL>"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}
Authorization: First save your OpenAI key in the Globals section of your bot (the third icon on the far right of the Design Studio), then insert that attribute in this field.
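Tiledesk sends this request for you from the Web Request block. To show what actually goes over the wire, here is a minimal Python sketch that uses only the standard library's `urllib` to prepare the same POST request without sending it. The function name and the `sk-test` key are hypothetical placeholders; the `Bearer <key>` format is what OpenAI's API expects in the Authorization header.

```python
import json
import urllib.request

OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_vision_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request the Web Request block makes."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OPENAI_CHAT_URL,
        data=body,
        method="POST",
        headers={
            # OpenAI authenticates each call with a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_vision_request("sk-test", {"model": "gpt-4o-mini", "max_tokens": 300})
# urllib.request.urlopen(request) would send it; left out so no real call is made.
```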

Result Processing: The API’s response is stored in a variable (result), which the bot formats and relays back to the user as a detailed bike description.

3. Sending the Image to OpenAI’s Vision API

API Call Configuration: The bot constructs an API request to OpenAI's GPT-4 Vision API. This API call is configured to send both a text prompt and the image URL to the model for processing.
API Request Structure:
- The bot first extracts the URL of the uploaded image and then incorporates it into the API request.
- The prompt given to the model is crucial. In this case, it is “Describe this bike in detail, including color, frame style, and any visible features.” A well-designed prompt helps the model return a comprehensive description.
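Because the fallback block collects the user's own question, the bot has two possible prompt sources. One way to combine them is to use the user's question when there is one and the fixed bike prompt otherwise. This is a hypothetical Python sketch of that choice; `compose_prompt` is an illustrative name, and the article does not say how the two sources are actually merged.

```python
from typing import Optional

# The fixed prompt quoted in this article.
DEFAULT_BIKE_PROMPT = (
    "Describe this bike in detail, including color, frame style, "
    "and any visible features."
)


def compose_prompt(user_question: Optional[str]) -> str:
    """Use the user's question if it has content, else the default bike prompt."""
    if user_question is None or not user_question.strip():
        return DEFAULT_BIKE_PROMPT
    return user_question.strip()
```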

4. Handling the Response from the Vision API

Data Extraction: The API’s response includes detailed descriptions of the bicycle, such as frame type, color, and any notable features.
Storing the Result: The bot stores the response in a variable (e.g., result) to ensure the data is readily accessible for further steps.
Error Intent: If an error occurs during the API call, the bot captures it in the error attribute and displays it to the user.
User Feedback: The bot formats the response and sends it back to the user as a detailed description, ensuring the information is clear and user-friendly.
Optional follow-ups: After displaying the results, the bot asks if the user wants to upload another image.
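The steps above map an API outcome onto either the `result` or the `error` attribute and then decide what to say. Below is a minimal Python sketch of that branching. It assumes OpenAI's usual response shapes (`choices[0].message.content` on success, `{"error": {"message": ...}}` on failure); the function names and message wording are hypothetical.

```python
from typing import Any, Dict, List

FOLLOW_UP = "Would you like to upload another image?"  # hypothetical wording


def outcome_to_attributes(status: int, body: Dict[str, Any]) -> Dict[str, str]:
    """Store a successful description in `result`, anything else in `error`."""
    if status == 200 and body.get("choices"):
        return {"result": body["choices"][0]["message"]["content"]}
    # OpenAI error bodies carry a human-readable message under "error".
    message = body.get("error", {}).get("message", f"HTTP {status}")
    return {"error": message}


def replies_for(attributes: Dict[str, str]) -> List[str]:
    """Messages the bot sends back, including the optional follow-up."""
    if "error" in attributes:
        return [f"Sorry, the image could not be analyzed: {attributes['error']}"]
    return [attributes["result"], FOLLOW_UP]
```

Keeping the outcome in two separate attributes mirrors the article's design, where later blocks can check which one is set.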

5. Integration with Tiledesk and OpenAI API

API Authorization: The bot uses an API key stored securely in Tiledesk’s backend. Requests to OpenAI's API are authorized using the Authorization header.
JSON Configuration: The bot uses JSON requests to send and receive data from the Vision API, following a structured format for image and text-based queries.

The integration of Tiledesk with OpenAI’s GPT-4 Vision API powers the image analysis functionality. The following subsections cover how to set up and use the API securely and reliably.

A. Obtaining and Securing API Keys

API Key for OpenAI: The bot requires an API key to access OpenAI’s Vision API. You create this key in your OpenAI account and store it securely in Tiledesk’s backend (for example, as a global attribute) or in a secure environment variable.
Authorization: When making requests to OpenAI’s API, the bot includes the API key in the request headers to authenticate each call.
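If the key lives in an environment variable rather than in Tiledesk's globals, the code that reads it should fail loudly when it is missing and should never write the full key to logs. This is a minimal Python sketch under those assumptions; the variable name `OPENAI_API_KEY` and both helper functions are illustrative choices, not a Tiledesk requirement.

```python
import os


def load_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment, refusing to continue without it."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; cannot authenticate with OpenAI")
    return key


def mask_key(key: str) -> str:
    """Shorten a key for logs so the secret itself is never printed."""
    if len(key) <= 8:
        return "****"
    return f"{key[:3]}...{key[-4:]}"
```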

B. Configuring the API Requests in Tiledesk

Tiledesk allows for custom API integrations by configuring webhook requests in the bot’s setup. In this case:
- Webhook Settings: The bot’s webhook is configured to route each user image upload request to OpenAI’s Vision API endpoint.
- API Endpoint: The API endpoint for OpenAI’s Vision model is specified in Tiledesk, so every request is sent to the correct URL.
Testing the API Call: Once configured, initial tests are conducted to ensure the bot successfully sends requests and receives the desired image descriptions in response.
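Initial tests do not have to spend API credits. A common technique is to inject the HTTP transport so a test can swap in a canned response. The following Python sketch shows this under stated assumptions: `describe_image`, `run_offline_check` and the canned bike description are hypothetical, and the request mirrors the JSON body used in step 2.

```python
import io
import json
import urllib.request
from typing import Any, Callable, List

API_URL = "https://api.openai.com/v1/chat/completions"


def describe_image(
    api_key: str,
    image_url: str,
    prompt: str,
    send: Callable[[urllib.request.Request], Any] = urllib.request.urlopen,
) -> str:
    """Send one vision request and return the model's description.

    `send` defaults to urlopen; tests pass a stub so no network is needed.
    """
    payload = {
        "model": "gpt-4o-mini",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": 300,
    }
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with send(request) as response:
        data = json.loads(response.read())
    return data["choices"][0]["message"]["content"]


def run_offline_check() -> None:
    """Exercise describe_image against a canned response instead of OpenAI."""
    captured: List[urllib.request.Request] = []

    def fake_send(request: urllib.request.Request) -> io.BytesIO:
        captured.append(request)
        canned = {"choices": [{"message": {"content": "A blue BMX bike."}}]}
        return io.BytesIO(json.dumps(canned).encode("utf-8"))

    text = describe_image("sk-test", "https://example.com/bike.jpg",
                          "Describe this bike.", send=fake_send)
    assert text == "A blue BMX bike."
    assert captured[0].get_method() == "POST"
    sent = json.loads(captured[0].data)
    assert sent["messages"][0]["content"][1]["image_url"]["url"] == "https://example.com/bike.jpg"
```

Once the offline check passes, the same function can be called once with a real key and image URL as a live smoke test.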

C. Handling the API Response

The bot’s code processes the JSON response from the OpenAI API, extracting and formatting the description data to make it user-friendly.
Example of a Response Handling Snippet:
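As a minimal Python sketch (not the bot's actual code), the extraction step can read `choices[0].message.content` from a Chat Completions response and flag replies that were cut off because they reached `max_tokens`, which OpenAI reports as `finish_reason == "length"`. The function name and the `[truncated]` marker are illustrative assumptions.

```python
from typing import Any, Dict


def extract_description(response: Dict[str, Any]) -> str:
    """Pull the model's text out of a Chat Completions response."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    choice = choices[0]
    text = (choice.get("message", {}).get("content") or "").strip()
    if choice.get("finish_reason") == "length":
        # The reply hit the max_tokens limit (300 in this article's request).
        text += " [truncated]"
    return text
```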
This approach ensures that users receive a clear and relevant answer based on the image they submitted.

D. Structuring the JSON Configuration for API Requests

Each API request is structured in JSON format, specifying the model, prompt, image URL, and optional settings like max_tokens to control the length of the response.
Example JSON for API Request:
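A small Python helper that builds and prints this JSON, with the same fields as the body in step 2 and the bike prompt from step 3, is sketched here. The image URL is a hypothetical placeholder, and `build_payload` is an illustrative name.

```python
import json


def build_payload(prompt: str, image_url: str,
                  model: str = "gpt-4o-mini", max_tokens: int = 300) -> dict:
    """Build a Chat Completions body with one text part and one image part."""
    if not prompt.strip() or not image_url.strip():
        raise ValueError("both a text prompt and an image URL are required")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        # Caps the length of the generated description.
        "max_tokens": max_tokens,
    }


print(json.dumps(build_payload(
    "Describe this bike in detail, including color, frame style, and any visible features.",
    "https://example.com/bike.jpg",  # hypothetical image URL
), indent=2))
```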
This structure allows the chatbot to send specific prompts and receive detailed responses tailored to the user's query.

E. Error Handling and Fallbacks

API Error Handling: If the API fails or returns an error, the bot is configured to catch these errors and respond with an informative message to the user.
Fallback Responses: If the response does not meet the expected criteria (e.g., if it’s too vague or irrelevant), the bot can prompt the user to try rephrasing their request or uploading a clearer image.
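What counts as "too vague" is up to you. The following Python sketch shows one simple heuristic: treat very short answers, or answers containing refusal-like phrases, as unusable and reply with a rephrase prompt. The word threshold, the phrase list and the message wording are all hypothetical and would need tuning against real responses.

```python
from typing import Optional

REPHRASE = "Could you rephrase your question or upload a clearer image?"  # hypothetical wording
VAGUE_MARKERS = ("i can't", "i cannot", "unable to", "not sure")  # illustrative list


def fallback_reply(description: Optional[str], min_words: int = 5) -> Optional[str]:
    """Return a fallback prompt if the description looks unusable, else None."""
    if not description:
        return REPHRASE
    text = description.strip().lower()
    if len(text.split()) < min_words or any(m in text for m in VAGUE_MARKERS):
        return REPHRASE
    return None
```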

F. Data Security and Compliance

Storing User Data: All API interactions should comply with data protection standards. User images and messages should be securely stored, anonymized if necessary, and deleted after a specified period to protect user privacy.
Using Environment Variables: API keys and other sensitive data are stored in environment variables or Tiledesk’s secure backend settings to prevent unauthorized access.
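Deleting data "after a specified period" comes down to comparing each stored item's timestamp with a cutoff. This is a minimal Python sketch of that check. The `StoredUpload` record and the 30-day retention used in the example are hypothetical; how and where Tiledesk stores uploads is not modelled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class StoredUpload:
    """Hypothetical record of one uploaded image."""
    image_url: str
    uploaded_at: datetime


def purge_expired(uploads: List[StoredUpload], now: datetime,
                  retention: timedelta) -> List[StoredUpload]:
    """Return only the uploads still inside the retention period.

    Anything not returned is due for deletion.
    """
    cutoff = now - retention
    return [u for u in uploads if u.uploaded_at >= cutoff]


now = datetime(2024, 11, 5, tzinfo=timezone.utc)
kept = purge_expired(
    [StoredUpload("https://example.com/old.jpg", now - timedelta(days=40)),
     StoredUpload("https://example.com/new.jpg", now - timedelta(days=1))],
    now,
    timedelta(days=30),  # example retention period
)
```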

This chatbot structure efficiently guides users through image-based inquiries, responds with relevant information using AI, and ensures a smooth user experience through predefined fallback and error responses. This setup can be adapted to various applications requiring image analysis and detailed feedback.
